Latest producer tracking in an out-of-order processor, and applications thereof

ABSTRACT

A processor and system for latest producer tracking. In one embodiment, the processor includes an operand renamer circuit that includes a register rename map, a producer tracking circuit that includes a producer tracking map, and a results buffer allocater circuit that includes a results buffer free list. Control logic modifies in-register status values stored in the register rename map based on producer tracking status values stored in the producer tracking map. The producer tracking status values stored in the producer tracking map are modified based on buffer identification values output by the results buffer allocater circuit.

CROSS REFERENCE TO RELATED APPLICATION

This application is related to commonly owned U.S. patent application Ser. No. ______, titled “Method For Latest Producer Tracking In An Out-Of-Order Processor, And Applications Thereof,” filed on the same day herewith (Attorney Docket No. 1778.2370001), which is incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

The present invention relates generally to processors and more particularly to processors having an out-of-order execution pipeline.

BACKGROUND OF THE INVENTION

Reduced Instruction Set Computer (RISC) processors are well known. RISC processors have instructions that facilitate the use of a technique known as pipelining. Pipelining enables a processor to work on different steps of several instructions at the same time and thereby take advantage of parallelism that exists among the steps needed to execute an instruction. As a result, a processor can execute more instructions in a shorter period of time. Additionally, modern Complex Instruction Set Computer (CISC) processors often translate their instructions into micro-operations (i.e., instructions similar to those of a RISC processor) prior to execution to facilitate pipelining.

Many pipelined processors, especially those used in the embedded market, are relatively simple in-order machines. As a result, they are subject to control, structural, and data hazard stalls. More complex processors have out-of-order execution pipelines. These more complex processors, often referred to as out-of-order processors, schedule execution of instructions around hazards that would stall an in-order machine.

Register renaming is a technique used by out-of-order processors to avoid unnecessary serialization of program operations imposed by the reuse of logical registers. In a conventional out-of-order processor, register renaming is implemented using a custom content-addressable memory (CAM) that holds a register map. The register map identifies associations formed between physical registers and logical registers. The CAM register map is searched, for example, during instruction decode and dispatch operations to identify physical registers that hold the latest results for source logical registers specified by an instruction.

In a conventional out-of-order processor, other register status information, such as, for example, information that indicates whether register data is available in a register file or off a bypass, is also maintained in a custom CAM. While custom CAMs and conventional out-of-order processing techniques work for their intended purposes, they are costly to implement in terms of chip area, power consumption, and processing speed. As a result, especially in the embedded market, the number of applications in which a conventional out-of-order processor may be used is restricted.

What are needed are new techniques for implementing out-of-order processing that overcome the limitations associated with conventional techniques.

BRIEF SUMMARY OF THE INVENTION

The present invention provides a processor and system for latest producer tracking, and applications thereof. In one embodiment, the processor includes an operand renamer circuit that includes a register rename map, a producer tracking circuit that includes a producer tracking map, and a results buffer allocater circuit that includes a results buffer free list.

The register rename map associates particular physical registers of a results buffer with particular logical/architectural state registers of a register file. The register rename map is indexed using register identification (RID) values. Each RID value represents a logical/architectural state register of the register file. The register rename map stores buffer identification (BID) values and in-register (INR) status values. Each BID value represents a physical register of a results buffer. The INR values are used to determine whether particular data values are available in a logical/architectural state register of the register file or in a physical register of the results buffer.

The producer tracking map stores producer tracking status values. These status values are used to identify which physical registers of the results buffer are being used by instructions to store the latest data prior to the data being transferred to logical/architectural state registers of the register file. The producer tracking status values stored in the producer tracking map are modified in one embodiment by placing BID values produced by the results buffer allocater circuit on a BID set bus or a BID clear bus of the producer tracking circuit.

The results buffer free list stores status values that identify which physical registers of the results buffer are available to store a value produced by an instruction. Instructions that produce values are assigned physical registers in which their results can be stored until instruction graduation. The function of the results buffer allocater circuit is to keep track of physical register availability and to output a BID value representing a physical register, which can be assigned to an instruction and used to store the value produced by the instruction.

Control logic modifies the INR status values stored in the register rename map based on the producer tracking status values stored in the producer tracking map.

Further embodiments, features, and advantages of the present invention, as well as the structure and operation of the various embodiments of the present invention, are described in detail below with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES

The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the pertinent art to make and use the invention.

FIG. 1 is a diagram of a processor according to an embodiment of the present invention.

FIG. 2 is a more detailed diagram of the processor of FIG. 1.

FIG. 3 is a diagram illustrating a relationship between a producer tracking map, a register rename map, and a results buffer free list of a processor according to an embodiment of the present invention.

FIG. 4 is a diagram that illustrates clearing a status bit of a producer tracking map according to an embodiment of the present invention.

FIG. 5 is a diagram that illustrates setting a status bit of a producer tracking map and updating a register rename map according to an embodiment of the present invention.

FIG. 6 is a diagram that illustrates updating a status bit of a register rename map according to an embodiment of the present invention.

FIG. 7 is a diagram that illustrates operation of a processor according to an embodiment of the present invention.

FIG. 8 is a diagram of an example system embodiment of the present invention.

The present invention is described with reference to the accompanying drawings. The drawing in which an element first appears is typically indicated by the leftmost digit or digits in the corresponding reference number.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a processor, system, and method for latest producer tracking in a processor, and applications thereof. In the detailed description of the invention that follows, references to “one embodiment”, “an embodiment”, “an example embodiment”, etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

FIG. 1 illustrates an example processor 100 according to an embodiment of the present invention. As shown in FIG. 1, processor 100 includes an instruction fetch unit 102, an instruction cache 104, an instruction decode and dispatch unit 106, one or more instruction execution unit(s) 108, a data cache 110, an instruction graduation unit 112, a register file 114, and a bus interface unit 116. Processor 100 is preferably capable of implementing multi-threading, but need not implement multi-threading. As used herein, multi-threading refers to an ability of an operating system and a processor to execute different parts of a program, called threads, simultaneously.

Instruction fetch unit 102 retrieves instructions from instruction cache 104 and provides instructions to instruction decode and dispatch unit 106. Instructions are retrieved in program order, for example, for one or more program threads. In one embodiment, instruction fetch unit 102 includes logic for recoding compressed format instructions to a format that can be decoded and executed by processor 100. In one embodiment, instruction fetch unit 102 includes an instruction buffer that enables instruction fetch unit 102 to hold multiple instructions for multiple program threads, which are ready for decoding, and to issue more than one instruction at a time to instruction decode and dispatch unit 106.

Instruction cache 104 is an on-chip memory array organized as a direct-mapped or multi-way set associative cache such as, for example, a 2-way set associative cache or a 4-way set associative cache. In one embodiment, instruction cache 104 is virtually indexed and physically tagged, thereby allowing virtual-to-physical address translations to occur in parallel with cache accesses. Instruction cache 104 interfaces with instruction fetch unit 102.

Instruction decode and dispatch unit 106 receives one or more instructions at a time from instruction fetch unit 102 and decodes them prior to execution. In one embodiment, instruction decode and dispatch unit 106 receives at least one instruction for each program thread being implemented during a particular clock cycle. As described herein, the number of program threads being implemented at any given point in time is variable. Decoded instructions are stored in a decoded instruction buffer and issued to instruction execution unit(s) 108, for example, after it is determined that selected operands are available. Instructions can be dispatched from instruction decode and dispatch unit 106 to instruction execution unit(s) 108 out-of-program-order.

Instruction execution unit(s) 108 execute instructions dispatched by instruction decode and dispatch unit 106. In one embodiment, at least one instruction execution unit 108 implements a load-store (RISC) architecture with single-cycle arithmetic logic unit operations (e.g., logical, shift, add, subtract, etc.). Other instruction execution unit(s) 108 can include, for example, a floating point unit, a multiply-divide unit and/or other special purpose co-processing units. In embodiments having multiple instruction execution units 108, one or more of the units can be implemented, for example, to operate in parallel. Instruction execution unit(s) 108 interface with data cache 110, register file 114, and a results buffer (not shown).

Data cache 110 is an on-chip memory array. Data cache 110 is preferably virtually indexed and physically tagged. Data cache 110 interfaces with instruction execution unit(s) 108.

Register file 114 represents a plurality of general purpose registers (e.g., logical/architectural state registers), which are visible to a programmer. Each general purpose register is a 32-bit or a 64-bit register, for example, used for logical and/or mathematical operations and address calculations. In one embodiment, register file 114 is part of instruction execution unit(s) 108. Optionally, one or more additional register file sets (not shown), such as shadow register file sets, can be included to minimize context switching overhead, for example, during interrupt and/or exception processing.

Bus interface unit 116 controls external interface signals for processor 100. In one embodiment, bus interface unit 116 includes a collapsing write buffer used to merge write-through transactions and gather writes from uncached stores. Processor 100 can include other features, and thus it is not limited to having just the specific features described herein.

FIG. 2 is a more detailed diagram of processor 100. As illustrated in FIG. 2, processor 100 performs four basic functions: instruction fetch; instruction decode and dispatch; instruction execution; and instruction graduation. These four basic functions are illustrative and not intended to limit the present invention.

Instruction fetch (represented in FIG. 1 by instruction fetch unit 102) begins when a PC selector 202 selects amongst a variety of program counter values and determines a value that is used to fetch an instruction from instruction cache 104. In one embodiment, the program counter value selected is the program counter value of a new program thread, the next sequential program counter value for an existing program thread, or a redirect program counter value associated with a branch instruction or a jump instruction. After each instruction is fetched, PC selector 202 selects a new value for the next instruction to be fetched.

During instruction fetch, tags associated with an instruction to be fetched from instruction cache 104 are checked. In one embodiment, the tags contain precode bits for each instruction indicating instruction type. If these precode bits indicate that an instruction is a control transfer instruction, a branch history table is accessed and used to determine whether the control transfer instruction is likely to branch or likely not to branch.

In one embodiment, any compressed-format instructions that are fetched are recoded by an optional instruction recoder 204 into a format that can be decoded and executed by processor 100. For example, in one embodiment in which processor 100 implements both 16-bit instructions and 32-bit instructions, any 16-bit compressed-format instructions are recoded by instruction recoder 204 to form instructions having 32 bits. In another embodiment, instruction recoder 204 recodes both 16-bit instructions and 32-bit instructions to a format having more than 32 bits.

After optional recoding, instructions are written to an instruction buffer 206. In one embodiment, this stage can be bypassed and instructions can be dispatched directly to instruction decoder 208.

Instruction decode and dispatch (represented in FIG. 1 by instruction decode and dispatch unit 106) begins, for example, when one or more instructions are received from instruction buffer 206 and decoded by an instruction decoder 208. In one embodiment, following resolution of a branch misprediction, the ability to receive instructions from instruction buffer 206 may be temporarily halted until selected instructions residing within the instruction execution portion and/or instruction graduation portion of processor 100 are purged.

In parallel with instruction decoding, operands are renamed. Register rename map(s) located within instruction identification (ID) generator and operand renamer 210 are updated and used to determine whether required source operands are available, for example, in register file 114 and/or a results buffer 218. A register rename map is a structure that holds the mapping information between programmer-visible architectural state registers and internal physical registers of processor 100. Register rename map(s) indicate whether data is available and where data is available. As will be understood by persons skilled in the relevant arts given the description herein, register renaming is used to remove instruction output dependencies and to ensure that there is a single producer of a given register in processor 100 at any given time. Source registers are renamed so that data is obtained from a producer at the earliest opportunity instead of waiting for the processor's architectural state to be updated. In parallel with instruction decoding, instruction ID generator and operand renamer 210 generates and assigns an instruction ID tag to each instruction. An instruction ID tag assigned to an instruction is used, for example, to determine the program order of the instruction relative to other instructions. In one embodiment, each instruction ID tag is a thread-specific sequentially generated value that uniquely determines the program order of instructions. The instruction ID tags can be used to facilitate graduating, in program order, instructions that were executed out-of-program-order.

Each decoded instruction is assigned a results buffer identification value or tag by a results buffer allocater 212. The results buffer identification value determines the location in results buffer 218 (e.g., a physical register) where instruction execution unit(s) 108 can write calculated results for an instruction. In one embodiment, the assignment of results buffer identification values is accomplished using a free list. The free list contains as many entries as the number of entries (e.g., physical registers) that make up results buffer 218. The free list can be implemented, for example, using a bitmap. Each bit of the bitmap can be used to indicate whether the corresponding results buffer entry is either available (e.g., if the bit has a value of one) or unavailable (e.g., if the bit has a value of zero).
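The free-list bitmap described above can be sketched in software as follows. This is a minimal Python model rather than a hardware description; the class name `FreeList`, the use of a single integer as the bitmap, and the choice to allocate the lowest-numbered available entry are assumptions made for illustration. It follows the example polarity given above, in which a one bit means the entry is available.

```python
from typing import Optional


class FreeList:
    """Bitmap free list for a results buffer: bit i == 1 means entry i is available."""

    def __init__(self, num_entries: int) -> None:
        self.num_entries = num_entries
        self.bits = (1 << num_entries) - 1  # every entry starts out available

    def allocate(self) -> Optional[int]:
        """Return the lowest-numbered available BID and mark it unavailable."""
        if self.bits == 0:
            return None                      # no free entry: the caller must wait
        lowest = self.bits & -self.bits      # isolate the lowest set bit
        self.bits &= ~lowest                 # mark that entry unavailable
        return lowest.bit_length() - 1       # convert the one-hot bit to an index

    def release(self, bid: int) -> None:
        """Make an entry available again, e.g. when its instruction graduates."""
        self.bits |= 1 << bid


fl = FreeList(4)
first, second = fl.allocate(), fl.allocate()  # BIDs 0 and 1
fl.release(first)                             # the instruction using BID 0 graduates
third = fl.allocate()                         # BID 0 is reused
```

A hardware free list would typically use a priority encoder over the bitmap; the lowest-set-bit trick here plays that role in software.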

Assigned results buffer identification values are written into a graduation buffer 224. In one embodiment, results buffer completion bits associated with newly renamed instructions are reset/cleared to indicate incomplete results. As instructions complete execution, their corresponding results buffer completion bits are set, thereby enabling the instructions to graduate and release their associated results buffer identification values. In one embodiment, control logic (not shown) ensures that one program thread does not consume more than its share of results buffer entries.
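The interaction between completion bits and in-order graduation can be illustrated with the minimal single-thread Python sketch below. The names `GradEntry` and `GraduationBuffer` are hypothetical, and stopping graduation at the oldest incomplete entry is an assumption consistent with the in-program-order graduation described in this document.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class GradEntry:
    bid: int                # results buffer identification value assigned at rename
    complete: bool = False  # completion bit: cleared at rename, set when execution finishes


class GraduationBuffer:
    def __init__(self) -> None:
        self.entries: Deque[GradEntry] = deque()  # program order, oldest on the left

    def rename(self, bid: int) -> None:
        self.entries.append(GradEntry(bid))       # newly renamed: result incomplete

    def complete(self, bid: int) -> None:
        for entry in self.entries:
            if entry.bid == bid:
                entry.complete = True
                return

    def graduate(self) -> List[int]:
        """Graduate from the oldest entry, stopping at the first incomplete one."""
        released: List[int] = []
        while self.entries and self.entries[0].complete:
            released.append(self.entries.popleft().bid)
        return released


gb = GraduationBuffer()
gb.rename(3)                     # older instruction, result in BID 3
gb.rename(7)                     # younger instruction, result in BID 7
gb.complete(7)                   # the younger instruction finishes first
graduated_early = gb.graduate()  # nothing graduates: the oldest entry is incomplete
gb.complete(3)
graduated_late = gb.graduate()   # both graduate, in program order
```

The released BIDs are what would be handed back to the free list.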

Decoded instructions are written to a decoded instruction buffer 214. An instruction dispatcher 216 selects instructions residing in decoded instruction buffer 214 for dispatch to execution unit(s) 108. In embodiments, instructions can be dispatched for execution out-of-program-order. In one embodiment, instructions are selected and dispatched, for example, based on their age (ID tags) assuming that their operands are determined to be ready.
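Age-based selection can be sketched as follows, assuming each decoded instruction carries an integer ID tag in which a smaller value means older, plus a single ready flag. `DecodedInstr`, `select_for_dispatch`, and the `width` parameter are hypothetical names; ID tag wraparound and per-thread ordering are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecodedInstr:
    id_tag: int           # program-order tag: smaller means older (no wraparound)
    operands_ready: bool  # True once all source operands are available


def select_for_dispatch(buffer: List[DecodedInstr], width: int = 1) -> List[DecodedInstr]:
    """Pick up to `width` ready instructions, oldest first."""
    ready = [instr for instr in buffer if instr.operands_ready]
    ready.sort(key=lambda instr: instr.id_tag)
    return ready[:width]


buffer = [DecodedInstr(5, True), DecodedInstr(2, False), DecodedInstr(3, True)]
chosen = [instr.id_tag for instr in select_for_dispatch(buffer, width=2)]
```

Because the oldest instruction (tag 2) is not ready, tags 3 and 5 are dispatched ahead of it, which is the out-of-program-order dispatch described above.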

Instruction execution unit(s) 108 execute instructions as they are dispatched. During execution, operand data is obtained as appropriate from data cache 110, register file 114, and/or results buffer 218. A multiplexer 215 and/or comparators (not shown) can be used to select data from results buffer 218 or register file 114. A result calculated by instruction execution unit(s) 108 for a particular instruction is written to a location/entry of results buffer 218 specified by the instruction's associated results buffer identification value.

Instruction graduation (represented in FIG. 1 by instruction graduation unit 112) is controlled by a graduation controller 220. Graduation controller 220 graduates instructions in accordance with the results buffer identification values stored in graduation buffer 224. When an instruction graduates, its associated result is transferred from results buffer 218 to register file 114. In conjunction with instruction graduation, graduation controller 220 updates, for example, the free list of results buffer allocater 212 to indicate a change in availability status of the graduating instruction's assigned results buffer identification value.

As illustrated in FIG. 3, in embodiments of the present invention, processor 100 includes a producer tracking circuit 302, an operand renamer circuit 305, and a results buffer allocater circuit 307, which are interconnected and used to keep track of instructions that are the latest producers of values to be stored in particular registers of register file 114.

Producer tracking circuit 302 includes a producer tracking map 303. Producer tracking map 303 stores producer tracking status values that are used to identify which physical registers of results buffer 218 are being used by instructions to store the latest data for particular logical registers. As shown in FIG. 3, the “1” bits stored in producer tracking map 303 for physical registers B1, B3, and B5 indicate that the instructions writing their data to physical registers B1, B3, and B5 are the latest producers of particular data values associated with particular logical registers.

In an embodiment, producer tracking map 303 stores “N” one-bit producer tracking status values, where “N” is the number of physical registers of results buffer 218. The “N” one-bit values are indexed using buffer identification (BID) values associated with the physical registers of results buffer 218. When a BID value is placed on an address bus of producer tracking circuit 302, a producer tracking status value corresponding to the BID value is output at a read data bus of producer tracking circuit 302. As shown in FIG. 3, this producer tracking status value is provided to in-register status value set/clear (INR SET/CLR) logic 304. Although INR SET/CLR logic 304 is illustrated as being separate from producer tracking circuit 302 and operand renamer circuit 305, it is to be understood that INR SET/CLR logic 304 can be implemented, for example, as a part of producer tracking circuit 302 and/or a part of operand renamer circuit 305. Particular producer tracking status values stored in producer tracking map 303 can be modified (e.g., set or cleared) by placing a BID value on a BID set bus or a BID clear bus.
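The producer tracking map can be modeled as an array of N one-bit values with a read port and separate set and clear inputs. In this hypothetical `ProducerTrackingMap` class, each call to `cycle` stands for one update; giving the set bus priority when the same BID appears on both buses is an assumption, since the text does not say which wins.

```python
from typing import List, Optional


class ProducerTrackingMap:
    """N one-bit producer tracking status values indexed by BID."""

    def __init__(self, num_physical_registers: int) -> None:
        # Cold reset: every status value starts at zero.
        self.bits: List[int] = [0] * num_physical_registers

    def read(self, bid: int) -> int:
        """BID on the address bus -> status value on the read data bus."""
        return self.bits[bid]

    def cycle(self, set_bid: Optional[int] = None, clear_bid: Optional[int] = None) -> None:
        """Apply one update from the BID clear bus and the BID set bus."""
        if clear_bid is not None:
            self.bits[clear_bid] = 0
        if set_bid is not None:
            self.bits[set_bid] = 1  # applied last, so set wins on a collision (assumption)


ptm = ProducerTrackingMap(16)
ptm.cycle(set_bid=5)               # B5 becomes the latest producer for some register
ptm.cycle(set_bid=1, clear_bid=5)  # a newer producer (B1) replaces it
```

The structure is a plain indexed bit array, which is the point of the design: no associative search is needed to find or update a status bit.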

Operand renamer circuit 305 includes a register rename map 306. Register rename map 306 associates particular physical registers of results buffer 218 with particular logical/architectural state registers of register file 114.

In an embodiment, register rename map 306 is indexed using register identification (RID) values. Each RID value represents one of the logical/architectural state registers of register file 114. As shown in FIG. 3, register rename map 306 stores, for example, for each RID index value, a buffer identification (BID) value and an in-register (INR) status value. Each BID value represents a physical register of results buffer 218. The INR values are used to determine whether particular data values are available in a logical/architectural state register of register file 114 or in a physical register of results buffer 218. Other values such as, for example, a data availability (AVAIL) status value can also be stored as part of register rename map 306 and indexed by RID values. An AVAIL status value can be used, for example, to identify whether an instruction can be dispatched.
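The role of the INR status value when a source operand is read can be sketched as below, with the function playing the part of the multiplexer that selects between the register file and the results buffer. The `RenameEntry` and `read_operand` names, and the use of Python dictionaries keyed by register name for the rename map and register file, are assumptions for illustration; the AVAIL value is carried but not used.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RenameEntry:
    bid: int            # physical register of the results buffer
    inr: bool           # True: the latest value is in the register file
    avail: bool = True  # optional AVAIL status value (unused in this sketch)


def read_operand(rid: str, rename_map: Dict[str, RenameEntry],
                 register_file: Dict[str, int], results_buffer: List[int]) -> int:
    """Select the operand source the way the INR value directs."""
    entry = rename_map[rid]
    if entry.inr:
        return register_file[rid]      # architectural copy is current
    return results_buffer[entry.bid]   # latest value is still in the results buffer


rename_map = {"R1": RenameEntry(bid=3, inr=False), "R2": RenameEntry(bid=8, inr=True)}
register_file = {"R1": 10, "R2": 20}
results_buffer = [0] * 16
results_buffer[3] = 99
```

Here R1 is read from physical register B3 because its INR value is zero, while R2 is read from the register file.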

In an embodiment, operand renamer circuit 305 preferably has two read address buses, two read data buses, a write address bus, and a write data bus, as illustrated in FIG. 3. Operand renamer circuit 305 also has an old BID value read bus that is coupled to the BID clear bus of producer tracking circuit 302.

Results buffer allocater circuit 307 includes a results buffer free list 308. Results buffer free list 308 stores status values that identify which physical registers of results buffer 218 are available to store a value produced by an instruction. In an embodiment, results buffer free list 308 stores “N” one-bit status values, where “N” is the number of physical registers of results buffer 218.

In the instruction decode and dispatch portion of the pipeline of processor 100, instructions that produce values are assigned physical registers in which their results can be stored until instruction graduation. The function of results buffer allocater circuit 307 is to output a BID value representing a physical register, which can be assigned to an instruction and used to store the value produced by the instruction. As shown in FIG. 3, the BID value output by results buffer allocater circuit 307 is provided to producer tracking circuit 302 and to operand renamer circuit 305.

The operations and interactions of producer tracking circuit 302, operand renamer circuit 305, and results buffer allocater circuit 307, as they relate to tracking an instruction (e.g., an ADD instruction) that is the latest producer of a data value associated with a particular logical register, will now be described in detail with reference to FIGS. 3-6.

Referring to FIG. 3, in an embodiment of the present invention, an instruction to be decoded is stored in a processor pipeline register 310. This occurs in the instruction decode and dispatch portion of the pipeline of processor 100 (see FIGS. 1 and 2). Each instruction to be decoded may potentially include a first group of bits 312 that specify a first logical/architectural state register (Source 1), a second group of bits 314 that specify a second logical/architectural state register (Source 2), and/or a third group of bits 316 that specify a third logical/architectural state register (Destination). These groups of bits, if present, are provided to operand renamer circuit 305. As an example, consider an ADD instruction such as “ADD (R3, R1, R2),” which implements “R3=R1+R2.” Such an instruction includes bits that identify the Source 1 register as register R1, the Source 2 register as register R2, and the Destination register as register R3.

Continuing further with the example ADD instruction noted above, as shown in FIG. 3, the bits 312 of the example ADD instruction, which represent register R1, are provided to a first read address bus of operand renamer circuit 305. Bits 312 are used as an index into register rename map 306. As shown in FIG. 3, the bits representing register R1 index BID bits stored in register rename map 306 that represent a physical register B3 of results buffer 218. As a result of the bits 312 being placed on the first read address bus, the bits representing physical register B3 are placed on a first read data bus of operand renamer circuit 305. The bits representing physical register B3 are then stored as bits 322 in a second processor pipeline register 320 in a subsequent clock cycle of processor 100.

The bits 314 of the example ADD instruction, which represent register R2, are provided to a second read address bus of operand renamer circuit 305. Bits 314 are also used as an index into register rename map 306. As shown in FIG. 3, the bits representing register R2 index BID bits stored in register rename map 306 that represent a physical register B8 of results buffer 218. As a result of the bits 314 being placed on the second read address bus, the bits representing physical register B8 are placed on a second read data bus of operand renamer circuit 305. The bits representing physical register B8 are then stored as bits 324 in the second pipeline register 320 in a subsequent clock cycle of processor 100.

The bits 316 of the example ADD instruction, which represent register R3, are provided to a write address bus of operand renamer circuit 305. Bits 316 are used as an index into register rename map 306. As shown in FIG. 3, the bits representing register R3 index BID bits stored in register rename map 306 that represent a physical register B5 of results buffer 218. As a result of the bits 316 being placed on the write address bus, the bits representing physical register B5 are placed on an old BID bus of operand renamer circuit 305. This feature of the present invention is illustrated in more detail in FIG. 4.

FIG. 4 is a diagram that illustrates the clearing of a producer tracking status bit of producer tracking map 303 according to an embodiment of the present invention. As shown in FIG. 4, bits 316 act as an index 402 into register rename map 306 of operand renamer circuit 305. Index 402 points to a location 404 of register rename map 306.

As shown in FIG. 4, location 404 stores bits that represent physical register B5 of results buffer 218. The association of physical register B5 with logical register R3, together with the producer tracking status value stored at location 406 of producer tracking map 303, indicates that prior to the example ADD instruction noted above, physical register B5 was the physical register used by an instruction that was the latest producer of a particular data value associated with logical register R3. Because this will no longer be the case (i.e., the example ADD instruction will be the latest producer), the producer tracking status value (i.e., the “1” bit stored at location 406 of producer tracking map 303) must be cleared. In an embodiment, clearing the “1” bit in location 406 is accomplished by placing the bits representing physical register B5 on the BID clear bus of producer tracking circuit 302.
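Using the FIG. 3 and FIG. 4 mappings (R1 to B3, R2 to B8, R3 to B5), the three rename map reads and the clearing of the old producer bit can be sketched as follows. The function name and the initial producer tracking bits (16 entries, with B3 and B5 set) are assumptions chosen for the example, not values stated in the text.

```python
from typing import Dict, List, Tuple


def read_ports_and_clear_old_producer(
    rename_bids: Dict[str, int], producer_bits: List[int],
    src1: str, src2: str, dest: str,
) -> Tuple[int, int, int]:
    """Read both source BIDs and the destination's old BID, then clear the old producer bit."""
    src1_bid = rename_bids[src1]   # first read data bus
    src2_bid = rename_bids[src2]   # second read data bus
    old_bid = rename_bids[dest]    # old BID bus, wired to the BID clear bus
    producer_bits[old_bid] = 0     # the old producer of dest is no longer the latest
    return src1_bid, src2_bid, old_bid


# Assumed state resembling FIG. 3 before ADD (R3, R1, R2) is renamed.
rename_bids = {"R1": 3, "R2": 8, "R3": 5}
producer_bits = [0] * 16
producer_bits[3] = producer_bits[5] = 1

ports = read_ports_and_clear_old_producer(rename_bids, producer_bits, "R1", "R2", "R3")
```

Only the destination's old entry is cleared; the producer status of the source registers is untouched.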

In parallel with clearing the bit at location 406 of producer tracking map 303, a physical register “B1” is allocated by results buffer allocater circuit 307 to hold the result of the example ADD instruction until the example ADD instruction graduates. As shown in FIG. 3, bits representing physical register B1 are provided by results buffer allocater circuit 307 to pipeline register 320, operand renamer circuit 305, and producer tracking circuit 302. As shown in FIG. 3, these bits are stored in pipeline register 320 as bits 326.

FIG. 5 is a diagram that illustrates how a BID value output by results buffer allocater circuit 307 is used to set a producer tracking status bit of producer tracking map 303 and to update register rename map 306 according to an embodiment of the present invention. As noted herein, the function of results buffer allocater circuit 307 and results buffer free list 308 is to identify which physical registers of the results buffer are available to store a value produced by an instruction and to allocate an available physical register to an instruction that produces a value, for example, during instruction decode.

In the embodiment shown in FIG. 5, the example ADD instruction includes bits 316, which indicate that the resultant value of the ADD instruction is to be written to logical/architectural state register R3. Before this happens, however, the resultant value will first be temporarily stored in a physical register of results buffer 218. Thus, results buffer allocater circuit 307 must identify a physical register that is available and communicate this information to operand renamer circuit 305.

As illustrated in FIG. 5, results buffer free list 308 shows a “1” bit associated with physical register B0. In an embodiment, the “1” bit indicates that physical register B0 is currently assigned to an instruction that has not yet graduated, and thus physical register B0 is unavailable. Results buffer free list 308 shows a “0” bit associated with physical register B1. The “0” bit indicates that physical register B1 is available, and thus results buffer allocater circuit 307 outputs bits (e.g., a BID value) that are used to represent/identify physical register B1. The “0” bit associated with physical register B1 is then set to “1” to indicate that physical register B1 is no longer available.

The BID value output by results buffer allocater circuit 307 is communicated to producer tracking circuit 302. In an embodiment, the BID value is placed on a BID set bus, which causes a bit stored at location 502 in producer tracking map 303 to be set to a value of one. A value of one indicates that the instruction associated with physical register B1 is the latest producer of a data value (e.g., the resultant value of the ADD instruction that will be written to register R3 of register file 114 upon graduation of the ADD instruction). A value of zero stored in producer tracking map 303, for example at the location indexed by a BID value representing physical register B0, indicates that any instruction associated with physical register B0 is not the latest producer of a value. In an embodiment, during a cold reset of processor 100, all of the producer tracking status values of producer tracking map 303 are reset to zero.

The BID value output by results buffer allocater circuit 307 is also communicated to operand renamer circuit 305. As shown in FIG. 5, in an embodiment, the BID value is placed on a write data bus and written to location 404 of register rename map 306. Location 404 is selected for storing the BID value because bits 316 are placed on the write address bus of operand renamer circuit 305. The INR bit stored at location 406 of register rename map 306 is reset to zero to indicate that the resultant value of the ADD instruction is not yet available in register R3 of register file 114. As explained below, the zero bit stored at location 406 may or may not be set to one when the ADD instruction graduates.
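
The register rename map update can be modeled as one entry per architectural register holding a BID value and an INR bit. This sketch is illustrative only: the entry layout, the default values, and the `source_location` read path (how a later instruction might decide between register file 114 and results buffer 218) are assumptions rather than details given in the description.

```python
from dataclasses import dataclass


@dataclass
class RenameEntry:
    bid: int = 0  # meaningful only while inr == 0
    inr: int = 1  # 1 = latest value is in the architectural register file


class RegisterRenameMap:
    """Hypothetical model of register rename map 306, indexed by architectural register (RID)."""

    def __init__(self, num_arch_registers):
        self.entries = [RenameEntry() for _ in range(num_arch_registers)]

    def write_destination(self, rid, bid):
        # Write address bus carries the destination RID (bits 316);
        # write data bus carries the BID from results buffer allocater circuit 307.
        entry = self.entries[rid]
        entry.bid = bid
        entry.inr = 0  # result not yet available in the register file

    def source_location(self, rid):
        # Assumed read path: where a later instruction would fetch this operand.
        entry = self.entries[rid]
        if entry.inr:
            return ("register_file", rid)
        return ("results_buffer", entry.bid)


rename_map = RegisterRenameMap(8)
rename_map.write_destination(3, 1)    # ADD writes R3; physical register B1 allocated
print(rename_map.source_location(3))  # operand now comes from results buffer B1
print(rename_map.source_location(1))  # R1 is still read from the register file
```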

FIG. 6 is a diagram that illustrates the updating of an INR status bit of register rename map 306 upon graduation of an instruction according to an embodiment of the present invention. In an out-of-order processor, instructions may be executed out-of-program-order, but all instructions graduate and update the architectural state of the processor in-program-order. This is accomplished, for example, by transferring data associated with instructions that are graduating in-program-order from the physical registers of results buffer 218 to the logical/architectural state registers of register file 114 as the instructions graduate.
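
In-order graduation can be illustrated with a program-order queue whose head is drained only while its oldest instruction has finished executing. The queue, its dictionary fields, and the `done` flag are hypothetical; the sketch only shows why a younger instruction that executes first still updates register file 114 after the older one.

```python
from collections import deque


def graduate_in_order(pending, results_buffer, register_file):
    """Graduate finished instructions from the head of a program-order queue.

    pending: deque of dicts {"rid", "bid", "done"}, oldest instruction first.
    Stops at the first unfinished instruction, so architectural state is
    updated strictly in program order even if younger instructions finished first.
    Returns the RIDs written, in order.
    """
    written = []
    while pending and pending[0]["done"]:
        inst = pending.popleft()
        # Copy the value from the physical register to the architectural register.
        register_file[inst["rid"]] = results_buffer[inst["bid"]]
        written.append(inst["rid"])
    return written


results_buffer = [0] * 8
register_file = [0] * 8
pending = deque([
    {"rid": 3, "bid": 1, "done": False},  # older instruction, still executing
    {"rid": 5, "bid": 4, "done": True},   # younger instruction, already executed
])
results_buffer[4] = 9
first = graduate_in_order(pending, results_buffer, register_file)   # nothing graduates
pending[0]["done"] = True
results_buffer[1] = 7
second = graduate_in_order(pending, results_buffer, register_file)  # both graduate, oldest first
print(first, second, register_file[3], register_file[5])
```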

When an instruction graduates, both the physical register and the logical/architectural state register associated with the producer instruction are known; otherwise, the value produced by the instruction could not be transferred from the physical register of the results buffer to the logical/architectural state register of the register file. This information is shown in FIG. 6 as BID value 602 and RID value 604. During graduation of an instruction, BID value 602 is placed on a BID read bus of producer tracking circuit 302, and RID value 604 is placed on an RID read bus of operand renamer circuit 305.

As shown in FIG. 6, when BID value 602 is placed on the BID read bus of producer tracking circuit 302, the producer tracking status value stored in producer tracking map 303 (e.g., at location 502) that is associated with BID value 602 is provided to INR SET/CLR logic 304. If this producer tracking status value indicates that the graduating instruction is the latest producer of a data value, INR SET/CLR logic 304 modifies the INR status value (e.g., at location 406) in register rename map 306 indexed by RID value 604 (e.g., index 402) to indicate that the data value is now available in register file 114 (e.g., in register R3). If, however, the producer tracking status value indicates that the graduating instruction is not the latest producer of the data value, the INR status value is not modified to indicate that the data value is available in register file 114. This feature of the present invention is described further below with reference to FIG. 7.
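
The decision made by INR SET/CLR logic 304 reduces to a conditional write, sketched below with plain lists standing in for the two maps. The function name is hypothetical, and clearing the producer tracking bit after the latest producer graduates follows the FIG. 7 walkthrough rather than the FIG. 6 description.

```python
def inr_set_clr_on_graduation(bid, rid, producer_tracking_map, inr_bits):
    """Hypothetical sketch of INR SET/CLR logic 304 for one graduating instruction.

    bid: BID value 602 (physical register of the graduating instruction)
    rid: RID value 604 (architectural destination register)
    producer_tracking_map: 0/1 producer tracking status values, indexed by BID
    inr_bits: 0/1 in-register status values, indexed by RID
    Returns True if the INR bit was set.
    """
    if producer_tracking_map[bid] == 1:
        inr_bits[rid] = 1                # latest value is now in the register file
        producer_tracking_map[bid] = 0   # latest producer has graduated
        return True
    return False                         # an older producer: leave the INR bit alone


ptm = [0] * 8
ptm[4] = 1   # B4 holds the latest producer of R3
inr = [1] * 8
inr[3] = 0   # R3 has been renamed; its value is not yet in the register file
print(inr_set_clr_on_graduation(1, 3, ptm, inr), inr[3])  # older producer (B1) graduates
print(inr_set_clr_on_graduation(4, 3, ptm, inr), inr[3])  # latest producer (B4) graduates
```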

FIG. 7 is a diagram that further illustrates operation of processor 100 according to an embodiment of the present invention. FIG. 7 shows how values stored in register rename map 306 and producer tracking map 303 are modified by two ADD instructions over six time periods. To simplify the description, each time period represents multiple processor clock cycles, as can be seen by comparing, for example, FIG. 2 and FIG. 7.

As shown in FIG. 7, at a time period T₀, a first ADD instruction (ADD-1) is fetched from instruction cache 104. Instruction ADD-1 implements the function “R3=R1+R2”. As shown in register rename map 306, at time period T₀, operand R1 is available in register R1 of register file 114 (i.e., the INR status bit equals one). Operand R2 is also available in register R2 of register file 114 (i.e., the INR status bit equals one). The next physical register available to store the resultant value produced by instruction ADD-1 is assumed to be physical register B1. None of the physical registers illustrated in producer tracking map 303 is being used by an instruction that is the latest producer of a data value (i.e., all of the producer tracking status values shown are zero).

At a time period T₁, instruction ADD-1 undergoes decoding and operand renaming. As shown in the register rename map of FIG. 7 for time period T₁, no operand renaming is required for source 1 and source 2 of the instruction, because both operands are available in register file 114.

During time period T₁, results buffer allocater circuit 307 allocates physical register B1 to store the resultant value of instruction ADD-1 until instruction ADD-1 graduates. How this is accomplished is shown, for example, in FIG. 5. The BID value associated with physical register B1 replaces the BID value associated with physical register B12 in register rename map 306. The appropriate INR status value is also reset to zero. In addition, the producer tracking status value for physical register B1 in producer tracking map 303 is set to one to indicate that the instruction writing to physical register B1 is the latest producer of the value that will be written to register R3.

At a time period T₂, instruction ADD-1 is executed and generates a resultant value that is stored in physical register B1. Also during time period T₂, a second ADD instruction (ADD-2) is fetched from instruction cache 104. Instruction ADD-2 implements the function “R3=R1+R4”. As shown in register rename map 306, operand R1 and operand R2 are available in register R1 and register R2, respectively, of register file 114. The next physical register available to store the resultant value produced by instruction ADD-2 is assumed to be physical register B4.

During a time period T₃, instruction ADD-2 undergoes decoding and operand renaming. Results buffer allocater circuit 307 allocates physical register B4 to store the resultant value of instruction ADD-2 until instruction ADD-2 graduates. As a result, the BID value associated with physical register B4 replaces the BID value associated with physical register B1 in register rename map 306. The appropriate INR status value is again reset to zero. Furthermore, the producer tracking status value for physical register B1 in producer tracking map 303 is set to zero, and the producer tracking status value for physical register B4 is set to one to indicate that the instruction writing to physical register B4 (i.e., ADD-2) is now the latest producer of the value that will be written to register R3.
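
The rename-stage bookkeeping for a destination register that already has an in-flight producer can be sketched as follows. The description states the result (the bit for B1 is cleared and the bit for B4 is set) but not the mechanism; reading the superseded BID from register rename map 306, and doing so only while the INR bit is zero, is an assumption of this sketch, as is the function name.

```python
def rename_destination(rid, new_bid, rename_bid, inr_bits, producer_tracking_map):
    """Hypothetical rename-stage update when a new instruction targets register `rid`.

    rename_bid, inr_bits: register rename map fields, indexed by RID
    producer_tracking_map: producer tracking status values, indexed by BID
    """
    if inr_bits[rid] == 0:
        # An older instruction that has not graduated is currently the latest
        # producer of `rid`; it is being superseded, so clear its tracking bit.
        producer_tracking_map[rename_bid[rid]] = 0
    # If inr_bits[rid] == 1, the stored BID is stale (its physical register may
    # have been reallocated), so it is not used to clear any tracking bit.
    rename_bid[rid] = new_bid
    inr_bits[rid] = 0
    producer_tracking_map[new_bid] = 1


R3, B1, B4, B12 = 3, 1, 4, 12
rename_bid = [0] * 8
rename_bid[R3] = B12  # BID left in the map before time period T1
inr = [1] * 8
ptm = [0] * 16
rename_destination(R3, B1, rename_bid, inr, ptm)  # ADD-1 at T1
rename_destination(R3, B4, rename_bid, inr, ptm)  # ADD-2 at T3
print(rename_bid[R3], inr[R3], ptm[B1], ptm[B4])
```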

As shown in FIG. 7, instruction ADD-1 graduates during time period T₃. During this time period, the data value stored in physical register B1 is written to logical/architectural register R3. Also, in an embodiment of the present invention, as illustrated for example in FIG. 6, the BID value “B1” is placed on the BID read bus of producer tracking circuit 302. This causes the producer tracking status value “0” that is stored in producer tracking map 303 and associated with BID value “B1” to be provided to INR SET/CLR logic 304. Because the producer tracking status value “0” indicates that the graduating instruction (ADD-1) is not the latest producer of the data value that will be written to register R3, INR SET/CLR logic 304 does not modify the INR status value for register R3 in register rename map 306 to indicate that the data value is available in register file 114. If the INR status value were modified, subsequent instructions would read the stale value written to register R3 by instruction ADD-1 rather than the latest value, which instruction ADD-2 stores in physical register B4.
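
The hazard avoided at time period T₃ can be replayed with a small function that marks R3 as in-register either unconditionally (as logic without producer tracking might) or only when the graduating instruction is the latest producer. The function name and the dictionary encoding are hypothetical and simply mirror the FIG. 7 values described above.

```python
def r3_source_after_add1_graduates(use_producer_tracking):
    """Replay time period T3 of FIG. 7 and report where a later reader of R3 looks.

    Hypothetical encoding: rename map fields and producer tracking bits as dicts.
    """
    R3, B1, B4 = 3, 1, 4
    rename_bid = {R3: B4}  # ADD-2 has already renamed R3 to physical register B4
    inr = {R3: 0}          # ... and cleared the INR bit for R3
    pt = {B1: 0, B4: 1}    # ADD-2 (B4) is the latest producer of R3

    # ADD-1 graduates: the value in B1 is copied into R3 of the register file.
    if not use_producer_tracking or pt[B1] == 1:
        inr[R3] = 1        # R3 would now be treated as holding the latest value
    if inr[R3]:
        return "register file R3 (ADD-1 result, stale)"
    return f"results buffer B{rename_bid[R3]} (ADD-2 result)"


print(r3_source_after_add1_graduates(use_producer_tracking=False))
print(r3_source_after_add1_graduates(use_producer_tracking=True))
```

Without the producer tracking check, a later reader of R3 is steered to the register file and picks up ADD-1's result; with the check, it keeps waiting on physical register B4.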

During a time period T₄, instruction ADD-2 is executed and generates a resultant value that is stored in physical register B4.

During a time period T₅, instruction ADD-2 graduates. During this time period, the data value stored in physical register B4 is written to logical/architectural register R3. This is accomplished as shown, for example, in FIG. 6. As described herein, the BID value “B4” is placed on the BID read bus of producer tracking circuit 302. This causes the producer tracking status value “1” that is stored in producer tracking map 303 and associated with BID value “B4” to be provided to INR SET/CLR logic 304. Because the producer tracking status value “1” indicates that the graduating instruction (ADD-2) is the latest producer of the data value that will be written to register R3, INR SET/CLR logic 304 modifies the INR status value for register R3 in register rename map 306 to indicate that the data value is now available in register file 114. In addition, the producer tracking status value for physical register B4 in producer tracking map 303 is set to zero.
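
The complete FIG. 7 sequence can be reproduced with a combined model of the three structures. This is an illustrative sketch under stated assumptions: the class and method names are hypothetical, the BID values are supplied by the caller to match the figure instead of being chosen by a search, the events within time period T₃ are applied with renaming of ADD-2 before graduation of ADD-1 (the order needed to reach the described state), and a physical register is assumed to return to the free list when its instruction graduates.

```python
class LatestProducerModel:
    """Hypothetical model of register rename map 306, producer tracking map 303,
    and results buffer free list 308. Sizes and initial contents are assumptions."""

    def __init__(self, num_rids=8, num_bids=16):
        self.rename_bid = [0] * num_rids
        self.inr = [1] * num_rids   # every architectural value starts in register file 114
        self.pt = [0] * num_bids    # cold reset: no latest producers
        self.busy = [0] * num_bids  # results buffer free list bits

    def rename(self, dest_rid, bid):
        """Decode/rename: allocate `bid` for the result destined for `dest_rid`."""
        if self.busy[bid]:
            raise ValueError(f"B{bid} is not free")
        self.busy[bid] = 1
        if self.inr[dest_rid] == 0:
            self.pt[self.rename_bid[dest_rid]] = 0  # older producer superseded
        self.rename_bid[dest_rid] = bid
        self.inr[dest_rid] = 0
        self.pt[bid] = 1

    def graduate(self, dest_rid, bid):
        """Graduation: set INR only if the graduating instruction is the latest producer."""
        if self.pt[bid] == 1:
            self.inr[dest_rid] = 1
            self.pt[bid] = 0
        self.busy[bid] = 0  # assumption: physical register freed at graduation

    def snapshot(self, rid, bids):
        state = {"INR": self.inr[rid], "BID": self.rename_bid[rid]}
        state.update({f"PT[B{b}]": self.pt[b] for b in bids})
        return state


R3, B1, B4 = 3, 1, 4
m = LatestProducerModel()
trace = []
m.rename(R3, B1)    # T1: ADD-1 renamed
trace.append(("T1", m.snapshot(R3, (B1, B4))))
m.rename(R3, B4)    # T3: ADD-2 renamed ...
m.graduate(R3, B1)  # ... and ADD-1 graduates
trace.append(("T3", m.snapshot(R3, (B1, B4))))
m.graduate(R3, B4)  # T5: ADD-2 graduates
trace.append(("T5", m.snapshot(R3, (B1, B4))))
for period, state in trace:
    print(period, state)
```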

FIG. 8 is a diagram of an example system 800 according to an embodiment of the present invention. System 800 includes a processor 802, a memory 804, an input/output (I/O) controller 806, a clock 808, and custom hardware 810. In an embodiment, system 800 is an application specific integrated circuit (ASIC) or a system on a chip (SOC).

Processor 802 is any processor that includes features of the present invention described herein and/or implements a method embodiment of the present invention. In one embodiment, processor 802 includes an instruction fetch unit, an instruction cache, an instruction decode and dispatch unit, one or more instruction execution units, a data cache, an instruction graduation unit, a register file, and a bus interface unit similar to processor 100 described above.

Memory 804 can be any memory capable of storing instructions and/or data. Memory 804 can include, for example, random access memory and/or read-only memory.

Input/output (I/O) controller 806 is used to enable components of system 800 to receive and/or send information to peripheral devices. I/O controller 806 can include, for example, an analog-to-digital converter and/or a digital-to-analog converter.

Clock 808 is used to determine when sequential subsystems of system 800 change state. For example, each time a clock signal of clock 808 ticks, state registers of system 800 capture signals generated by combinatorial logic. In an embodiment, the clock signal of clock 808 can be varied. The clock signal can also be divided, for example, before it is provided to selected components of system 800.

Custom hardware 810 is any hardware added to system 800 to tailor system 800 to a specific application. Custom hardware 810 can include, for example, hardware needed to decode audio and/or video signals, accelerate graphics operations, and/or implement a smart sensor. Persons skilled in the relevant arts will understand how to implement custom hardware 810 to tailor system 800 to a specific application.

While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant computer arts that various changes in form and detail can be made therein without departing from the spirit and scope of the invention. Furthermore, it should be appreciated that the detailed description of the present invention provided herein, and not the summary and abstract sections, is intended to be used to interpret the claims. The summary and abstract sections may set forth one or more but not all exemplary embodiments of the present invention as contemplated by the inventors.

For example, in addition to implementations using hardware (e.g., within or coupled to a Central Processing Unit (“CPU”), microprocessor, microcontroller, digital signal processor, processor core, System on Chip (“SOC”), or any other programmable or electronic device), implementations may also be embodied in software (e.g., computer readable code, program code, instructions and/or data disposed in any form, such as source, object or machine language) disposed, for example, in a computer usable (e.g., readable) medium configured to store the software. Such software can enable, for example, the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described herein. For example, this can be accomplished through the use of general programming languages (e.g., C, C++), GDSII databases, hardware description languages (HDL) including Verilog HDL, VHDL, SystemC Register Transfer Level (RTL) and so on, or other available programs, databases, and/or circuit (i.e., schematic) capture tools. Such software can be disposed in any known computer usable medium including semiconductor, magnetic disk, optical disk (e.g., CD-ROM, DVD-ROM, etc.) and as a computer data signal embodied in a computer usable (e.g., readable) transmission medium (e.g., carrier wave or any other medium including digital, optical, or analog-based medium). As such, the software can be transmitted over communication networks including the Internet and intranets.

It is understood that the apparatus and method embodiments described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL) and transformed to hardware in the production of integrated circuits. Additionally, the apparatus and methods described herein may be embodied as a combination of hardware and software. Thus, the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

1. A processor, comprising: an operand renamer circuit that stores a plurality of in-register status values, wherein each in-register status value is associated with a particular architectural state register; a producer tracking circuit that stores a plurality of producer tracking status values, wherein each producer tracking status value is associated with a particular physical register; and control logic that modifies the in-register status values based on the producer tracking status values.
2. The processor of claim 1, wherein the operand renamer circuit includes a register rename map.
3. The processor of claim 2, wherein values stored in the register rename map are accessed using architectural state register identification values.
4. The processor of claim 1, wherein the producer tracking circuit includes a producer tracking map.
5. The processor of claim 4, wherein values stored in the producer tracking map are accessed using physical register identification values.
6. The processor of claim 1, wherein each producer tracking status value consists of a single bit.
7. The processor of claim 1, wherein the operand renamer circuit sends a signal to the producer tracking circuit to modify a particular producer tracking status value.
8. The processor of claim 1, wherein the processor further comprises a results buffer allocater circuit, and the results buffer allocater circuit sends a signal to the producer tracking circuit to modify a particular producer tracking status value.
9. A system, comprising: a processor that includes an operand renamer circuit that stores a plurality of in-register status values, wherein each in-register status value is associated with a particular architectural state register, a producer tracking circuit that stores a plurality of producer tracking status values, wherein each producer tracking status value is associated with a particular physical register, and control logic that modifies the in-register status values based on the producer tracking status values; and memory coupled to the processor.
10. The system of claim 9, wherein the operand renamer circuit includes a register rename map.
11. The system of claim 9, wherein the producer tracking circuit includes a producer tracking map.
12. The system of claim 9, wherein the operand renamer circuit sends a signal to the producer tracking circuit to modify a particular producer tracking status value.
13. The system of claim 9, wherein the processor further comprises a results buffer allocater circuit, and the results buffer allocater circuit sends a signal to the producer tracking circuit to modify a particular producer tracking status value.
14. A tangible computer readable storage medium comprising a processor embodied in software, the processor comprising: an operand renamer circuit that stores a plurality of in-register status values, wherein each in-register status value is associated with a particular architectural state register; a producer tracking circuit that stores a plurality of producer tracking status values, wherein each producer tracking status value is associated with a particular physical register; and control logic that modifies the in-register status values based on the producer tracking status values.
15. The tangible computer readable storage medium of claim 14, wherein the operand renamer circuit includes a register rename map.
16. The tangible computer readable storage medium of claim 14, wherein the producer tracking circuit includes a producer tracking map.
17. The tangible computer readable storage medium of claim 14, wherein the operand renamer circuit sends a signal to the producer tracking circuit to modify a particular producer tracking status value.
18. The tangible computer readable storage medium of claim 14, wherein the processor further comprises a results buffer allocater circuit, and the results buffer allocater circuit sends a signal to the producer tracking circuit to modify a particular producer tracking status value.
19. The computer readable storage medium of claim 14, wherein the processor is embodied in hardware description language software.
20. The computer readable storage medium of claim 19, wherein the processor is embodied in one of Verilog hardware description language software and VHDL hardware description language software.